The detection of unmodeled gravitational-wave bursts by ground-based interferometric gravitational-wave detectors is a major goal for the advanced detector era. These searches are commonly cast as pattern recognition problems, where the goal is to identify statistically significant clusters in spectrograms of strain power when the precise signal morphology is unknown. In previous work, we introduced a clustering algorithm referred to as “seedless clustering” and showed that it is a powerful tool for detecting weak, long-lived (~10-1000 s) signals in background.

However, as the algorithm is currently conceived, carrying out an all-sky search on ~1 year of data may require significant computational resources for background estimation. Alternatively, some of the sensitivity of the search must be sacrificed to control computational costs. Because background studies are required to assign significance in gravitational-wave searches, the sensitivity of the algorithm is limited by the available computing resources. In this paper, we present an analytic method for estimating the background generated by the seedless clustering algorithm and compare its performance to both Monte Carlo Gaussian noise and time-shifted gravitational-wave data from a week of LIGO’s 5th Science Run. We demonstrate qualitative agreement between the model and measured distributions and argue that the approximation will be useful to supplement conventional background estimation techniques for advanced detector searches for long-duration gravitational-wave transients.

PACS numbers: 95.75.-z,04.30.-w,07.05.Bx 


I. INTRODUCTION 

Second-generation gravitational-wave detectors such as Advanced LIGO [...] and Advanced Virgo [5] will come online in the coming months and years. Some searches for gravitational-wave bursts seek to detect signals lasting ≲1 s [3]. Other searches target signals lasting ~10-1000 s. Compact binary coalescences of black holes and/or neutron stars are one example of long-lived gravitational-wave sources [...]. Theoretically uncertain models exist for more exotic sources of long-lived transients, including emission from rotational instabilities in protoneutron stars [...] and black-hole accretion disk instabilities [...]. When a matched filter search is not possible, searches for long-lived bursts [...] can be employed.

Searches for compact binary coalescences rely on performing time-slides of single-detector “triggers,” generated by applying a matched filter to single-detector time-series data. Burst searches instead use clustering algorithms on single-detector time-frequency maps [20] or on multi-detector coherent maps [...], [21], [22]. Calculating the coherent statistic requires significant computational resources. Because detector noise is generally non-Gaussian, it is difficult to know whether an event in one detector is signal or noise. For this reason, multiple detectors are required to perform gravitational-wave searches. To estimate the background, these searches time-shift the data of each detector with respect to the other(s) by an unphysical delay that is larger than the light travel time between the detectors. The coherent statistics are then computed from the time-shifted data in the same way as in the original search algorithm, and in this way, false alarm rates can be estimated. Current searches use thousands (or more) of time-slides [3], [...], [23]. The number of time-slides that can be performed is limited mainly by computational resources. The short-duration coherent burst pipelines have been tested in the ten-thousand time-slide regime and are moving towards hundreds of thousands, while the compact binary matched filtering pipelines have successfully generated 5σ background distributions. The difficulty in reaching these levels is due to computationally intensive calculations such as the matched filter used in compact binary coalescence searches [23], the calculation of the coherent SNR [53] used in burst searches, and potentially seedless clustering [...] in a long-duration transient search.

Was et al. demonstrated the limitation of using time-slides to perform background estimation in the single-detector trigger case (this is not generally a problem for coherent analyses) [26], [...]. They showed that the precision of the background estimate obtained from time-slides of trigger streams is in fact limited, and that the variance associated with their use saturates at some point. The computational limitations and the potential problems with time-slides motivate a search for alternative forms of background estimation in gravitational-wave searches. Searches for isotropic stochastic gravitational-wave backgrounds [...] and directional searches towards Sco X-1, the galactic center, and SN1987A [...] have assumed that the detection statistic is normally distributed with a known mean and variance that can be calculated from first principles. These searches sum data from long stretches of time; combined with the use of long time segments (60 s) and Gaussianity cuts, their statistics are Gaussian by the Central Limit Theorem. This has the significant computational benefit of not requiring time-slides to perform the search, although limited time-shift analyses are used as sanity checks and to identify particularly non-Gaussian frequency bins so that they can be removed from the analysis.

Some searches for long-duration gravitational-wave bursts use the same cross-correlation technique as stochastic searches [...], although other methods exist [...], [22]. They use cross-power spectrograms, computed from the cross-correlation of two gravitational-wave detectors, and apply pattern recognition algorithms to search for clusters of excess strain cross-power [15]. One algorithm used to search for long-duration gravitational waves is “seedless clustering,” which integrates along many different paths in spectrograms. This algorithm is sensitive to signals that can be well-approximated by parameterized curves, and the advantage of seedless clustering is most pronounced for long, weak signals [...]. We have previously shown how seedless clustering algorithms can significantly enhance the sensitivity of searches for signals of this type [...]. Although seedless clustering algorithms are “embarrassingly parallel” [31], so that the computations can be performed on graphics processing units, seedless clustering searches are still limited by the computation of the noise background.

Cannon et al. [32] recently proposed a method to estimate the false alarm probability of compact binary coalescences without time-shifts. They rely on approximating compact binary events as a Poisson process. In particular, this allows for a statistical detection of a population of events, which could be collectively more significant than the single most significant event alone. The method proposed in our paper is similar in that we measure events based on the data and then use a statistical approximation to the distribution of the measured tracks to approximate the noise background. There are also a number of notable differences. Because long-duration transient gravitational waves are typically searched for using a coherent combination of detector data, the trigger distributions no longer obey Poisson statistics. Instead, we exploit the fact that seedless clustering sums many approximately statistically independent pixels, which lets us use Gaussian statistics to estimate the background. In this paper, we demonstrate a semi-analytical approximation to the seedless clustering output from cross-correlation spectrograms.

One potential criticism of the analysis that follows is that we compare the approximation with data from time-shifted analyses out to ≈3σ, not to the 5σ distributions we present at the end of the paper. This is a “chicken-or-the-egg” problem in the following sense: to truly verify the approximation, it would be necessary to perform 5σ worth of time-shifts. This calculation is currently very difficult to do computationally, and of course, if we could perform 5σ worth of time-shifts, we would not need an approximation in the first place. Moreover, because we perform the analysis using a relatively clean week-long stretch of data, other data sets could yield different results. Therefore, we consider the analysis that follows as a first test of the feasibility of an approximate method. As argued above, we expect the background distributions to be better behaved in long-duration analyses than in short-duration searches, and therefore perhaps less susceptible to significant deviations from empirical distributions. In the future, we can test the utility of the approximation by comparing it against distributions generated by analyses that perform more time-slides over longer periods. Although time-slides are likely required to create confidence in a detection because of the problem just described, we now summarize several reasons why it is useful to consider alternative significance-estimation strategies.

Algorithm verification. The semi-analytic method provides a verification of the pipeline in multiple ways. When data-quality work is performed correctly, the data should be well-approximated by Gaussian noise, apart from some noise transients that pass the data-quality cuts. Therefore, the background estimate should approximately follow the distribution expected for Gaussian data. This also provides a sanity check that the algorithm performs as expected on the data. By performing a limited number of time-slides or running on Gaussian noise, one can check whether the model for the algorithm is correct, which can provide confidence that the algorithm is performing as expected (or not).

Sensitivity to waveform models. There are a number of papers in the literature on the sensitivity of gravitational-wave detectors to long-duration gravitational waves [...]. In general, these sensitivity studies have been performed by running the analysis on 1,000 ft-maps to reach a FAP of 0.1%, and the sensitivity to various waveform models is computed relative to this threshold. For a year of data, assuming ft-maps of 250 s with 50% overlap and a desired FAP of ≈3σ (FAP = 0.27%), more than 10^5 maps will be analyzed. Before any analysis (either a search for gravitational waves or a waveform sensitivity study) is performed, it is desirable to be able to estimate the background quickly. This estimate informs expectations of potential results as well as how to set up the analysis. Using the method described in this paper, we can analytically compute the expected threshold for this number of maps without needing to run nearly as many time-slides.
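The map count above follows from simple arithmetic. The sketch below reproduces it under stated assumptions (a 365.25-day year, and 250 s maps with 50% overlap, i.e. one new map every 125 s), together with the two-sided Gaussian convention that maps 3σ to FAP = 0.27% and the 1/M floor on the smallest FAP that M background maps can resolve. The helper names are illustrative, not taken from any pipeline.

```python
import math

SECONDS_PER_YEAR = 365.25 * 86400.0  # assumption: Julian year


def count_maps(total_time_s, map_duration_s=250.0, overlap=0.5):
    """Number of ft-maps of a given duration and fractional overlap fitting in total_time_s."""
    stride = map_duration_s * (1.0 - overlap)  # 125 s between map start times at 50% overlap
    return int((total_time_s - map_duration_s) // stride) + 1


def fap_from_sigma(k):
    """Two-sided Gaussian convention used in the text (3 sigma <-> FAP = 0.27%)."""
    return math.erfc(k / math.sqrt(2.0))


def smallest_resolvable_fap(n_background_maps):
    """With M background maps, the smallest empirical FAP is 1/M."""
    return 1.0 / n_background_maps


n_year = count_maps(SECONDS_PER_YEAR)
print(n_year, fap_from_sigma(3.0), smallest_resolvable_fap(1000))
```

Under these assumptions, a year of data gives roughly 2.5 × 10^5 maps, the order of magnitude quoted in the text, while 1,000 background maps cannot resolve a FAP below 0.1%.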


Event follow-up and electromagnetic alerts. Preparations are under way for joint electromagnetic and gravitational-wave observations in the advanced detector era [33]. Low-latency gravitational-wave searches are aiming for run-times < 1 min. Significance estimates on this time-scale require rapid background estimation techniques. The method described in this paper can give an approximate FAR for any event on this time-scale. If there is eventually interest in joint electromagnetic and gravitational-wave observations of generic long-duration transients, this method may be useful for providing rapid significance estimates.


The remainder of the paper is organized as follows. In Sec. II, we describe the formalism of an all-sky transient search and seedless clustering. In Sec. III, we present the results of a Monte Carlo and time-shifted study comparing the semi-analytical model to seedless clustering. In Sec. IV, we discuss our conclusions and suggest directions for future research.


II. FORMALISM 


We use the cross-correlation of two GW strain channels from spatially separated detectors to perform searches for long-duration GW transients. We construct ft-maps of cross-power signal-to-noise ratio. We divide detector strain time series into segments and compute Fourier transforms of the segments to create the pixels, which we denote as s_I(t;f), where we take strain data from detector I for the segment with a mid-time of t. Following [...], the segments are 50%-overlapping and Hann-windowed, with a duration of 1 s and a frequency resolution of 1 Hz.

The expression for the cross-power signal-to-noise ratio is as follows [15]:

$$\rho(t;f) = \mathrm{Re}\left[\frac{2\sqrt{2}}{\mathcal{N}}\, e^{2\pi i f \Delta\vec{x}\cdot\hat{\Omega}/c}\, \frac{s_I^*(t;f)\, s_J(t;f)}{\sqrt{P_I(t;f)\, P_J(t;f)}}\right], \tag{1}$$

where Δx⃗ is a vector describing the relative displacement of the two detectors, Ω̂ is the direction of the GW source, and c is the speed of light. The time delay between the two detectors, Δx⃗·Ω̂/c, enters through the direction-dependent phase factor in the exponential. P_I(t;f) and P_J(t;f) are the auto-power spectral densities for detectors I and J in the segments neighboring t. 𝒩 is an FFT normalization factor, 𝒩 = L × F_s, where L is the length of data in seconds and F_s is the sampling frequency. Additional details can be found in [...].

We write the total signal-to-noise ratio for a cluster of pixels as a sum over ρ(t;f):

$$\mathrm{SNR}_{\rm tot} = \frac{1}{\sqrt{N}} \sum_{\{t;f\}\in\Gamma} \rho(t;f), \tag{2}$$

where N is the number of pixels in Γ, which is chosen from a bank of parametrized frequency-time tracks; each track is referred to as a “template.”

To modify the above algorithm to perform an all-sky search [...], we use a “complex signal-to-noise ratio”:

$$p(t;f) = \frac{2\sqrt{2}}{\mathcal{N}}\, \frac{s_I^*(t;f)\, s_J(t;f)}{\sqrt{P_I(t;f)\, P_J(t;f)}}. \tag{3}$$

This statistic preserves the complex phase information, which encodes the direction of the source. As a proxy for the sky location, which is unknown, we add an additional variable Δτ, which corresponds to the time delay between the detectors. Therefore, we rewrite Eq. (2) as

$$\mathrm{SNR}_{\rm tot} = \frac{1}{\sqrt{N}} \sum_{\{t;f\}\in\Gamma} \mathrm{Re}\left[e^{2\pi i f \Delta\tau}\, p(t;f)\right], \tag{4}$$

and this sum is carried out for many randomly selected clusters Γ.

A. Parameterizations 

In any seedless clustering algorithm, Γ is chosen such that it is sensitive to the morphology of the gravitational waves being searched for. We consider two types in this paper, although the method is generic enough to work for any parameterization.

Bezier curves. For generic narrow-band long transient gravitational waves [...], Γ is chosen randomly from the set of quadratic Bezier curves [34], subject to the constraint that the curve persists for a minimum duration t_min. Three time-frequency control points determine the template, P_0 = (t_start, f_start), P_1 = (t_mid, f_mid), and P_2 = (t_end, f_end), and the curve is parameterized by ξ ∈ [0,1]:

$$B(\xi) = (1-\xi)^2 P_0 + 2(1-\xi)\,\xi\, P_1 + \xi^2 P_2. \tag{5}$$

Representing each template as an array of pixels allows the sums in Eqs. (2) and (4) to be computed for a large number of templates in parallel. For practical applications, the number of templates T is typically chosen to be T = O(10^…-10^…). For a computationally feasible all-sky analysis, T = 2 × 10^4 templates can be used, and we use this number in the analysis that follows.
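A quadratic Bezier template (Eq. (5)) can be rasterized into pixels by sampling ξ densely and keeping each distinct (time bin, frequency bin) pair. The sketch below does this. random_bezier_template is a hypothetical helper showing one way to enforce the minimum-duration constraint by drawing t_end ≥ t_start + t_min; the uniform distributions it uses are assumptions, not the distributions of the original implementation. n_samples should comfortably exceed the curve length in pixels so that no pixel along the track is skipped.

```python
import random


def bezier_point(xi, p0, p1, p2):
    """Eq. (5): B(xi) = (1-xi)^2 P0 + 2(1-xi) xi P1 + xi^2 P2 for a (t, f) pair."""
    a, b, c = (1.0 - xi) ** 2, 2.0 * (1.0 - xi) * xi, xi ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0], a * p0[1] + b * p1[1] + c * p2[1])


def bezier_track(p0, p1, p2, n_samples=1000):
    """Sample the curve densely and keep unique (time bin, frequency bin) pixels in order."""
    pixels, seen = [], set()
    for k in range(n_samples):
        t, f = bezier_point(k / (n_samples - 1), p0, p1, p2)
        pixel = (int(round(t)), int(round(f)))
        if pixel not in seen:
            seen.add(pixel)
            pixels.append(pixel)
    return pixels


def random_bezier_template(rng, n_t, n_f, t_min):
    """Hypothetical helper: random control points with t_end - t_start >= t_min (in bins)."""
    t_start = rng.uniform(0, n_t - 1 - t_min)
    t_end = rng.uniform(t_start + t_min, n_t - 1)
    t_mid = rng.uniform(t_start, t_end)
    f0, f1, f2 = (rng.uniform(0, n_f - 1) for _ in range(3))
    return bezier_track((t_start, f0), (t_mid, f1), (t_end, f2))


# Usage: one random template on a 250 x 150 pixel map with t_min = 50 bins.
template = random_bezier_template(random.Random(0), 250, 150, 50)
```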

Post-Newtonian templates for compact binary coalescences. Another parameterization for Γ currently in the literature creates templates based on a post-Newtonian model for chirp-like signals created by circular compact binary coalescences [18]. For searches for compact binary coalescences with seedless clustering, we can use a more specialized template bank consisting of parametrized chirps:

$$f(t) = \frac{1}{2\pi}\,\frac{c^3}{4 G M_{\rm total}} \sum_{k=0} p_k\, \Theta^{-(3+k)/8}, \tag{6}$$

where

$$\Theta = \frac{\eta\, c^3 (t_c - t)}{5 G M_{\rm total}}, \tag{7}$$

the expansion coefficients p_k can be found in [35], G is the gravitational constant, M_total is the total mass of the binary, η is the symmetric mass ratio, and t_c is the coalescence time. These templates are parameterized by the coalescence time and the chirp mass. It was shown in [18] that, while the waveform depends on the individual component masses, the main features of the signal can be well-approximated using only the chirp mass, so we can approximate the individual component masses as equal.
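As a rough sketch of Eqs. (6)-(7), the function below keeps only the leading k = 0 term, assuming p_0 = 1 (the Newtonian normalization) and the equal-mass value η = 1/4. The higher-order coefficients p_k from [35] are deliberately omitted, so this illustrates the shape of a chirp track rather than providing a usable template bank. The physical constants are rounded SI values.

```python
import math

G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8       # speed of light, m/s
M_SUN = 1.989e30  # solar mass, kg


def chirp_frequency(t, t_c, m_total_msun, eta=0.25):
    """Leading-order (k = 0, assumed p_0 = 1) truncation of Eqs. (6)-(7), in Hz.

    Higher post-Newtonian terms are omitted; eta = 1/4 is the equal-mass case.
    """
    if t >= t_c:
        raise ValueError("frequency is only defined before coalescence (t < t_c)")
    gm = G * m_total_msun * M_SUN
    theta = eta * C ** 3 * (t_c - t) / (5.0 * gm)  # Eq. (7)
    return (1.0 / (2.0 * math.pi)) * C ** 3 / (4.0 * gm) * theta ** (-3.0 / 8.0)  # Eq. (6), k = 0


def chirp_mass(m_total_msun, eta=0.25):
    """Chirp mass M_c = eta^(3/5) * M_total, in solar masses."""
    return eta ** 0.6 * m_total_msun
```

For a 1.4 + 1.4 M_⊙ binary (M_total = 2.8 M_⊙, chirp mass ≈1.22 M_⊙), chirp_frequency(0.0, 1.0, 2.8) is about 133 Hz one second before coalescence, inside the 10-150 Hz band used for the chirp analysis in Sec. III B.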


B. Semi-analytical approximation 

We now describe a semi-analytical approximation to the background of our seedless clustering algorithms. Seedless clustering, which computes the sum of pixels in a track divided by the square root of the number of pixels in the track, lends itself to modeling due to its simplicity. By the central limit theorem, the sum of a sufficiently large number of independent random variables, each with a well-defined expected value and a well-defined variance, will be approximately normally distributed. Therefore, we expect that the sum of many pixels will approach a normal distribution, given by

$$\mathrm{PDF}_{\rm Normal}(z) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(z-\mu)^2}{2\sigma^2}\right]. \tag{8}$$

Because seedless clustering measures the maximum SNR_tot of many tracks, we seek the extreme value distribution for SNR_tot. This is motivated by the desire for a distribution with which to compare those measured from an analysis using the algorithm. We can analytically compute a probability distribution for this maximum value as follows. Given a random sample of SNR_tot values (X_1, ..., X_n) from a continuous distribution with probability density function f(x) and cumulative distribution function F(x), the cumulative distribution function of the maximum of SNR_tot is

$$\mathrm{CDF}_{\rm Max[SNR_{tot}]}(z) = P\big(\max(X_1,\ldots,X_n) \le z\big) = P(X_1 \le z, \ldots, X_n \le z) = P(X_1 \le z)\cdots P(X_n \le z), \tag{9}$$

where the third equality assumes that the random samples are independent. We discuss the effect of this assumption in Sec. III B. In the case where the samples are identically distributed, the equation becomes

$$\mathrm{CDF}_{\rm Max[SNR_{tot}]}(z) = [P(X \le z)]^n = [F_X(z)]^n. \tag{10}$$

We show below that we can use this equation, with Gaussian CDFs, to approximate the seedless clustering distributions. Even though in our case F_X(z) is derived from a Gaussian distribution, Eq. (10) holds for any distribution represented by F_X(z). Hence, in cases where the analytic expression for F_X(z) is difficult to derive or approximate, one can use the observed distribution.
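Eq. (10) is simple enough to check numerically. The sketch below uses Python's statistics.NormalDist for a Gaussian F_X, differentiates Eq. (10) to obtain the density of the maximum, n f(z) F(z)^(n-1), and compares the analytic CDF with a brute-force Monte Carlo of independent draws. The draws are independent by construction, which is exactly the assumption behind the third equality of Eq. (9), so agreement here tests the arithmetic only, not whether that assumption holds for real tracks.

```python
import random
from statistics import NormalDist


def cdf_max(z, n, mu=0.0, sigma=1.0):
    """Eq. (10) with a Gaussian F_X: P(max of n i.i.d. SNR_tot values <= z)."""
    return NormalDist(mu, sigma).cdf(z) ** n


def pdf_max(z, n, mu=0.0, sigma=1.0):
    """Derivative of Eq. (10): n f(z) F(z)^(n-1)."""
    dist = NormalDist(mu, sigma)
    return n * dist.pdf(z) * dist.cdf(z) ** (n - 1)


def empirical_cdf_max(z, n, trials, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of P(max <= z) from independent Gaussian draws."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        largest = max(rng.gauss(mu, sigma) for _ in range(n))
        if largest <= z:
            below += 1
    return below / trials
```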


III. BACKGROUND STUDY 

We test the approximations by performing the analysis on Monte Carlo Gaussian noise and on initial LIGO noise from the Hanford, WA (H1) and Livingston, LA (L1) detectors. We create complex signal-to-noise ratio spectrograms p(t;f) using Eq. (3) and analyze each with the various seedless clustering algorithms. Following [...], we create 250 s maps in a band between 100-250 Hz with a spectrogram resolution of 1 s × 1 Hz using 50%-overlapping Hann windows. The results for each are as follows.


A. Bezier parameterization 


We begin by analyzing the performance of the analytic model on the Bezier parameterization.

We run the seedless clustering algorithm over hundreds of these maps and save SNR_tot for each of the tracks in the map. The left panel of Fig. 1 shows a histogram of the resulting SNR_tot values for both the Monte Carlo and initial LIGO data. We fit Eq. (8) to the resulting distributions and find best-fit values of μ = 0.0007 and σ = 0.99. That the distribution has a mean of approximately zero and a standard deviation of approximately 1 is expected, because ρ has a mean of 0 and we use the 1/√N normalization in the SNR_tot calculation. We find that the agreement is reasonable out to the tails of the distribution. The right panel of Fig. 1 shows the standard deviation of the distribution as a function of track length. The standard deviation differs on the order of a few percent across the track lengths considered. For the sake of simplicity, we assume that the distribution is approximately independent of track length.
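One simple way to carry out the fit and the track-length check described above is sketched below: estimate μ and σ by the sample mean and standard deviation (for Gaussian data these are essentially the maximum-likelihood estimates), then group SNR_tot values by track length and report the fractional spread of the per-length σ. The grouping approach and helper names are assumptions for illustration; the text does not specify how its fit was performed.

```python
import math
import random
from collections import defaultdict


def fit_gaussian(samples):
    """Sample mean and (n-1)-normalized standard deviation as estimates of mu, sigma in Eq. (8)."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / (n - 1))
    return mu, sigma


def sigma_by_track_length(records):
    """records: iterable of (track_length, snr_tot) pairs; returns {length: sigma}."""
    groups = defaultdict(list)
    for length, value in records:
        groups[length].append(value)
    return {length: fit_gaussian(vals)[1] for length, vals in groups.items() if len(vals) > 1}


def fractional_spread(sigmas):
    """(max - min) / mean of the per-length standard deviations."""
    values = list(sigmas.values())
    return (max(values) - min(values)) / (sum(values) / len(values))


# Usage with synthetic, hypothetical data: two track lengths drawn from the same Gaussian.
rng = random.Random(7)
records = [(length, rng.gauss(0.0, 0.99)) for length in (50, 100) for _ in range(2000)]
per_length = sigma_by_track_length(records)
```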

We now simulate an all-sky search by performing 100 time-slides on a week of data. The data are processed with a glitch identification [35] cut as if it were a real analysis. In order to apply the algorithm from [36], we assume that the source is optimally oriented with an optimal sky position.

FIG. 1: The plot on the left is the background distribution for the seedless clustering cluster SNR, SNR_tot, defined in Eq. (2). Monte Carlo denotes Gaussian colored noise. Time-shift denotes time-shifted data with vetoes to limit the effects of instrumental artifacts. The theoretical line corresponds to the Gaussian approximation to the distribution given by Eq. (8). The plot on the right is the standard deviation of SNR_tot as a function of track length. The standard deviation differs on the order of a few percent across the track lengths considered. In our analysis, we approximate the standard deviation of SNR_tot as constant across track lengths.

To compare this to the analytic approximation in Eq. (10), we use the Gaussian fit shown in Fig. 1 to approximate the SNR_tot distribution. The steps required to turn the SNR_tot distribution into a p-value vs. SNR distribution are as follows. To generate an SNR_tot value for a single simulated “map,” we generate n random numbers (one per template, so n = T) from the Gaussian distribution with the mean and variance estimated above. We then take the maximum of these values to compute Max[SNR_tot]. To generate a p-value vs. SNR distribution, Max[SNR_tot] is generated for M instances, where M is the number of instances required. The Max[SNR_tot] values are placed in ascending order, and the p-values are assigned as an array between 1/M and 1 with spacing 1/M, so that the largest Max[SNR_tot] receives p = 1/M. For a Gaussian distribution where the mean and standard deviation are the same across all trials, this process can also be performed analytically by simply computing Eq. (10) for the measured distribution.
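The Monte Carlo procedure and its analytic counterpart can be sketched as follows, assuming each template's SNR_tot is an independent draw from the fitted Gaussian (so correlations between overlapping tracks are ignored). simulate_max_snr sorts M simulated maxima and assigns p-values from 1 down to 1/M in steps of 1/M; analytic_p_value is one minus Eq. (10). The fitted μ = 0.0007 and σ = 0.99 come from the text, while the template and map counts in the usage line are deliberately small, hypothetical values so that the example runs quickly.

```python
import random
from statistics import NormalDist


def simulate_max_snr(n_templates, n_maps, mu, sigma, seed=0):
    """One Max[SNR_tot] per simulated map, with p-values assigned by rank.

    Each template's SNR_tot is an independent N(mu, sigma) draw (correlations ignored).
    """
    rng = random.Random(seed)
    maxima = sorted(
        max(rng.gauss(mu, sigma) for _ in range(n_templates)) for _ in range(n_maps)
    )
    # Ascending order: the i-th value (0-based) is matched or exceeded by n_maps - i maps,
    # so p-values run from 1 down to 1/M in steps of 1/M.
    p_values = [(n_maps - i) / n_maps for i in range(n_maps)]
    return maxima, p_values


def analytic_p_value(z, n_templates, mu, sigma):
    """1 - Eq. (10): probability that a noise-only map has Max[SNR_tot] >= z."""
    return 1.0 - NormalDist(mu, sigma).cdf(z) ** n_templates


# Usage with small, hypothetical counts and the Bezier fit from the text.
maxima, p_values = simulate_max_snr(100, 1000, mu=0.0007, sigma=0.99)
```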

We perform two search simulations using the Bezier parameterization. The first uses Bezier templates computed for a specific search direction. The second loops over time-delays for each template. By searching over 40 different time-delays, corresponding to 40 different sky rings, a computationally efficient all-sky search can be performed; this was demonstrated in [18] to be sufficient to recover signals in arbitrary directions. This involves a rotation in the complex plane of the individual pixels that make up the tracks, which creates difficulty for the analytic analysis. Because the analysis amounts to a rotation, the 40 time-delays do not correspond to 40 independent trials (which would simply multiply the number of tracks by 40). One possibility would be to measure the covariances between the rotated pixels. This situation is similar to [30]. In that work, the authors place limits on gravitational-wave strain from different portions of the sky. This was difficult because the distribution of maximum SNR for a sky map contains non-zero covariances between different pixels (or patches) on the sky. They simulate the covariance between pixels numerically, by simulating many realizations that have the expected covariances described by the Fisher matrix. In our case, we could numerically compute a covariance matrix and diagonalize it to create a basis of uncorrelated variables. One would then generate random realizations of these uncorrelated variables and use the covariance matrix to convert them into a set of randomly generated correlated variables. One difficulty is that the distribution of the uncorrelated variables might not be the same as that of the correlated variables. With this method, we could determine the set of correlated variables that describes the distribution of SNR_tot.
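The covariance approach outlined above amounts to generating correlated Gaussian variables from independent ones. The sketch below swaps in a Cholesky factorization C = L Lᵀ for the diagonalization mentioned in the text (an eigendecomposition would also work; both yield a matrix that maps independent unit normals to variables with covariance C). The covariance matrix is an input: measuring it for rotated pixels or overlapping tracks is the hard part and is not attempted here. Like the text, the sketch assumes the correlated variables are Gaussian, so it inherits the caveat noted above.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def correlated_gaussians(low, rng):
    """Map independent N(0,1) draws z to x = L z, which has covariance L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(low))]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


def max_of_correlated(cov, trials, seed=0):
    """Monte Carlo maxima when the track statistics share the covariance matrix cov."""
    rng = random.Random(seed)
    low = cholesky(cov)
    return [max(correlated_gaussians(low, rng)) for _ in range(trials)]
```

Correlation lowers the typical maximum relative to the same number of independent trials, which is why treating correlated trials as independent is conservative.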

The top-left panel of Fig. [...] demonstrates the analysis for the first simulation using 20,000 tracks, showing both empirical time-slides and the theoretical approximation method explained above (with the 10th, 50th, and 90th percentiles). We find excellent agreement between the analytic model and empirical time-slides. There are a number of reasons we might expect small disagreements between the theoretical model and the empirical results. First, generating purely random numbers to approximate SNR_tot is an approximation, because the tracks are analyzed on the same map, and therefore overlapping tracks will have correlated SNR_tot values. Another implicit assumption is that the pixels in the tracks are uncorrelated. This assumes that the noise is Gaussian and stationary and ignores the correlation between pixels in the maps. However, the cross-power statistic uses PSDs from adjacent pixels (in the time direction) to estimate the noise normalization (P_I and P_J in Eq. (1)). This is done to avoid biasing the pixel SNR for an isolated loud pixel, and it means that adjacent pixels are correlated. A similar concern arises from the fact that real detectors have noise transients and non-stationary noise, which violate some of the approximations used here. The top-right panel of Fig. [...] demonstrates the analysis for the second simulation, also using 20,000 tracks. In this case, we multiply the number of templates by 40, corresponding to the 40 time-delays. We expect, and observe, minor disagreement due to the conservative assumption that each loop corresponds to an independent trial. We show the Max[SNR_tot] values required for a 5σ gravitational-wave detection using the Bezier parameterization in Fig. [...].
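Translating a target significance into a Max[SNR_tot] threshold means inverting Eq. (10): solve 1 - F(z)^n = FAP for z. The sketch below does this for a Gaussian F. It computes the per-trial tail probability with log1p/expm1 to avoid round-off when FAP/n is of order 10^-13, and converts kσ to a FAP with the text's two-sided convention. The trial count n is an input: the usage treats the 40 time-delays conservatively as 40 independent trials per template, as in the text, and yields a per-map threshold. How the number of maps enters the thresholds shown in the figure is not specified here, so that choice is left to the caller.

```python
import math
from statistics import NormalDist


def threshold_for_fap(fap, n_trials, mu=0.0, sigma=1.0):
    """Solve 1 - F(z)^n = fap for z, with F the N(mu, sigma) CDF (inverse of Eq. (10)).

    The per-trial tail probability q = 1 - (1 - fap)^(1/n) is computed with
    log1p/expm1, and z is obtained from the lower tail by symmetry for accuracy.
    """
    q = -math.expm1(math.log1p(-fap) / n_trials)
    return mu - sigma * NormalDist().inv_cdf(q)


# Usage: 5 sigma (two-sided convention) with the Bezier fit and T = 2 x 10^4 templates.
fap_5sigma = math.erfc(5.0 / math.sqrt(2.0))
directed = threshold_for_fap(fap_5sigma, 20_000, mu=0.0007, sigma=0.99)
all_sky = threshold_for_fap(fap_5sigma, 40 * 20_000, mu=0.0007, sigma=0.99)
```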

B. Compact binary coalescence parameterization 

We now analyze the performance of the analytical model on the chirp templates. We create maps assuming Gaussian noise consistent with the design sensitivity of Advanced LIGO. Following [18], we create 660 s maps in a band between 10-150 Hz with a spectrogram resolution of 1 s × 1 Hz.

We perform an analysis similar to the one above. We find best-fit values for the Gaussian distribution of μ = −0.004 and σ = 1.06. The major difference between the Bezier and chirp-like template analyses is the degree of correlation between the drawn tracks. In the Bezier case, the tracks are drawn randomly, and the degree of correlation is simply determined by the overlap between the tracks. In the chirp-like template case, the degree of correlation is much higher, despite the significantly smaller number of templates used in the analysis. This correlation arises from the step in time and the overlap in parameter space between chirp-like templates of similar chirp mass.

This correlation changes the standard deviation of SNR_tot for the tracks in individual maps. Fig. [...] shows the cumulative distribution function of the per-map standard deviation of SNR_tot for the two parameterizations. In the Bezier case, this cumulative distribution is approximately a step function, which allows a single standard deviation to cover all cases. The distribution is significantly broader for chirp-like templates due to track correlations. For this reason, we modify the Bezier p-value algorithm by drawing each map's standard deviation from the measured distribution of per-map standard deviations before drawing from the Gaussian distribution. The bottom panel of Fig. [...] demonstrates the algorithm using both empirical Monte Carlo noise and the theoretical approximation method explained above (with the 10th, 50th, and 90th percentiles) for both a directed and an all-sky search. We show the Max[SNR_tot] values required for a 5σ gravitational-wave detection using the chirp-like parameterization in Fig. [...].
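The modified procedure for chirp-like templates, in which each simulated map first draws its own standard deviation, can be sketched as a mixture. The analytic p-value averages Eq. (10) over the measured per-map σ values, and the Monte Carlo draws σ uniformly from the same list before drawing the template values. The σ list in the test is hypothetical, standing in for the measured distribution, and μ = −0.004 is the fit quoted in the text. Correlations between tracks within a map are still ignored; only the map-to-map variation of σ is modeled.

```python
import random
from statistics import NormalDist


def p_value_sigma_mixture(z, n_templates, sigmas, mu=0.0):
    """P(Max[SNR_tot] >= z) = 1 - <F_sigma(z)^n>, averaging Eq. (10) over per-map sigmas."""
    mean_cdf = sum(NormalDist(mu, s).cdf(z) ** n_templates for s in sigmas) / len(sigmas)
    return 1.0 - mean_cdf


def simulate_max_snr_mixture(n_templates, n_maps, sigmas, mu=0.0, seed=0):
    """Draw each map's sigma uniformly from the measured list, then its template values."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_maps):
        s = rng.choice(sigmas)
        maxima.append(max(rng.gauss(mu, s) for _ in range(n_templates)))
    return maxima
```

A broader σ distribution fattens the tail of Max[SNR_tot], so using a single σ for chirp-like templates would underestimate the threshold needed for a given FAP.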


IV. CONCLUSIONS AND FUTURE WORK 

In previous work, we showed how a seedless clustering algorithm could significantly improve the sensitivity of searches for long-lived, unmodeled gravitational-wave transients [...]. Here, we showed how the simplicity of the search statistic allows for the development of a semi-analytic approximation to the background generated by the algorithm, and we compared its performance using a week of LIGO S5 time-shifted data. We described algorithmic subtleties not addressed by this model and quantified the errors between the model and the measured distribution. We argued that the approximation will be useful for pipeline characterization, as well as potentially for low-latency FAP reporting for gravitational-wave searches.

In the future, we will move beyond the simple models presented here to more complicated models. Examples could include using non-Gaussian distributions, such as the Student-t distribution, to better approximate the tails of the distribution, which is where we expect the strongest disagreement. Other ideas include using the Edgeworth expansion to put bounds on the deviation from Gaussianity. Because the tracks in individual maps are correlated (some overlap and use the same pixels), we could also consider generating correlated random values when generating our distributions for SNR_tot.